Abstract


Proliferation of personal computers, explosive growth of the Internet, changes in
communication services, and the availability of inexpensive hardware on the desktop have
paved the way for group collaboration in distributed environments. Many computer tools
have extended their support for effective group collaboration on the Unix platform.

In this thesis we have developed a PC-based teleseminaring tool, SlideTalk, on the MS
Windows platform. SlideTalk allows the speaker and the audience to interact during
the conference through PCs networked either on the same LAN or on an internet. The
system software is designed completely under the ObjectWindows environment and uses the
IP group delivery model for communication between members, which works on top of UDP
as the underlying transport protocol.

The complete implementation work for SlideTalk has been done in two parts: Display
of Slides and Voice Conferencing. This thesis deals with the voice conferencing
component. The speaker's voice is multicast in real time to all the group members. The
complete development of SlideTalk has been done on top of WhiteBoard, which provides
the functionality to exchange graphics and multifont text in a multicast environment.
Chapter 1


Introduction 


Relatively inexpensive hardware for processing audio and video data is rapidly becoming
available for desktop Personal Computers. At the same time, greater network bandwidth
at lower prices is making its way to the desktop. These developments have stimulated
interest in and demand for distributed applications that require these new capabilities
for processing streams of digitized audio and video data. For example, applications
such as computer-based conferencing, multimedia mail, and document processing have
been developed to enhance users' work environments and to foster more effective group
collaboration, possibly in a distributed setup.

In this thesis and its companion thesis [1] we have developed a remote seminaring
tool, SlideTalk, for the PC platform. In the following sections we briefly introduce the
features of SlideTalk, the existing implementation, and the features that we have added
to enable SlideTalk to become a useful seminaring tool.


1.1 SlideTalk

The goal of SlideTalk is to enable teleseminaring. The seminar session is delivered with
the speaker and the audience interacting through PCs that are networked either through
the same LAN or through an internet. A graphical user interface (GUI) provides them
with a user-friendly environment during the seminar.

In SlideTalk all the information is multicast. Multicasting is used to communicate
among a group, and the members of the multicast group are identified by a unique
address known as the group address. (Network nodes have one or more multicast addresses
in addition to their unicast addresses. The multiple multicast addresses correspond to
memberships in multiple groups.) Thus, before commencing a session, both the speaker
and the audience decide upon a group address to which their session will belong and
the time at which the seminar is to begin. To join the seminar session, the speaker or
an audience member first submits the group address to the application to get a SlideTalk
window.

The SlideTalk windows of the speaker and of the audience offer the same menu
choices, but the speaker controls the use of some of these choices. The menu
choices available are slide (to display a PostScript slide in the SlideTalk window),
voice (to begin or end a one-way voice communication), graphical annotating objects (to
either annotate a slide or draw figures on a blank screen) and text (which allows
a participant to type multifont, multicoloured text). All information generated at a
participant's machine is multicast to all the group members. This means that all the
members of the group always have the same information displayed in their windows.

The slide feature allows importing of a PostScript slide file from the Ghostview
application into the SlideTalk window. This import information is then multicast so that
the same slide is opened in the SlideTalk window of all the group members. It is assumed
that all members of the group have the slide that is to be imported in the same path
as on the speaker's machine. For voice communication, the speaker's talk is multicast in
real time.

The speaker controls the slide display and voice communication. If the speaker opts
for the slide, no member of the audience is able to open a slide. Similarly, if the
audio session is started by the speaker, the microphones of the audience are deactivated.

The graphical objects supported by SlideTalk are ellipse, rectangle, straight
line and curved line. It also has a text editor. For text, the font type, font size and
font colour can be selected. The graphics editor has a provision to select the colour and
border thickness of the figures. The text and figures can be drawn either on blank
pages or on the slides to annotate during the seminar.
Only one menu item is accessible at any time, except for audio. One other option
is supported concurrently with audio, to enable the speaker to continue talking while
importing a slide or while annotating it.

The SlideTalk session data is multicast over the network. Since the data is
transmitted only once to all the participants, the bandwidth requirement on the network is
not a function of the number of participants, and SlideTalk can support any number of
participants simultaneously. Participants can log into or log out of the session at any
time without disrupting the ongoing session. A member who joins late receives only
the data that is transmitted after the joining time.

1.2 Previous Implementation

The design and implementation of the WhiteBoard was done by Gupta and Mukhopadhyay
[2]. They developed the WhiteBoard, in which they provided the basic functions for
graphics and text. They used multicast for communication among the group members.
The packetization of the data is handled by the application. The communication mode
is free mode, which in this case means that there is no limit on the number of members
interacting at the same time.

WhiteBoard also has options for font style, font size and font colour. As soon as
some text is typed on any of the consoles, a packet is made on a per-character basis,
filling fields like the character value, the coordinates of the character and the font
data. Then the packet is multicast. Similarly, if some graphical object (a curved line,
a straight line, a rectangle or a circle) is drawn, a packet is filled with information like
the type of the object, the pen size, the border thickness, the coordinates of the figure
and the colour of the figure drawn.

The design of the WhiteBoard is object oriented. Separate layers have been designed
to interface with Windows Sockets, to prepare and transmit packets, and to start
the session. The highest abstraction allows the user to draw a graphical object or
write text on the WhiteBoard.

Since the design of the WhiteBoard is object oriented, it is an excellent platform
on which to build SlideTalk. We have added features to it by modifying its top
layers, while the lowest layer that deals with multicasting of the information is retained
from WhiteBoard. Thus, we have enhanced the features of the WhiteBoard to realize
SlideTalk.


1.3 Our Implementation

In this implementation we have incorporated the two features of slide display [1] and
voice communication. Primitive forms of audio and video playback have also been
incorporated. PostScript slide files can be imported into the SlideTalk window. Other slide
formats like PowerPoint, bitmap and gif could also be displayed using a similar technique,
but this has not been implemented. PostScript files cannot be displayed directly in the
window and require the Ghostview application for displaying them. Thus, to display a slide,
the slide is opened in the Ghostview application and then imported into the display
window. The main challenge is to import the data of one application into the other without
loss of resolution. When a slide gets displayed on the speaker's console, the same slide file
information is multicast to the group. Using the graphics and text functions developed
for the WhiteBoard, the speaker can annotate the slide.

In the voice component we transmit voice in WAVE format with 8-bit PCM encoding.
The audio buffers are filled by the Sound Blaster audio device driver, and packetized voice
is multicast. The mode of interaction in the voice mode is FIFO: speaking requests are
serviced in order of arrival, but only one microphone is enabled at a time. Here the main
challenge is to minimize latency and loss of fidelity, and also to keep the voice session
synchronized.

1.4 Organization of the Thesis

In this thesis and its companion, the two main features of SlideTalk, slideshow and
voice communication, have been developed. The rest of the chapters of this thesis are
organized as follows. Chapter 2 describes in detail the WhiteBoard platform on
which SlideTalk was developed. Chapter 3 deals with the design issues for the speech
communication component and the implementation details. Chapter 4 discusses the
testing of the implementation and also contains concluding remarks and future work.
Implementing both the import of slides and voice transmission was necessary
to make SlideTalk a usable seminar tool. To avoid duplication of effort, Chapters 1 and
2 of this thesis and the companion thesis [1] are common.
Chapter 2


The Previous Implementation - 
WhiteBoard 


The WhiteBoard implementation of Gupta and Mukhopadhyay [2] uses a multicast
framework and features exchange of graphics and text information. The motivation for the
WhiteBoard design and features is from the work of Van Jacobson [3]. Van Jacobson et al.
developed a framework for scalable reliable multicast (SRM). The functionality provided
in SRM is the eventual delivery of all data to all group members without enforcing any
particular order. Their framework has been prototyped in wb, a distributed
teleseminaring application which maintains a shared window supporting graphics and text. This
application works under X Windows.

The framework of the WhiteBoard borrows heavily from the SRM work. As in SRM,
WhiteBoard ensures eventual delivery of all the data to the entire group. The
specification of the WhiteBoard was such that global ordering of information delivery was not
necessary. Since WhiteBoard was not designed to be a distributed conferencing package,
the requirements on real-time delivery were also not very stringent. The design, however,
permits an application that is related to it to incorporate its own ordering on the packets
exchanged in the group.
2.1 The Framework for WhiteBoard


The communication model for WhiteBoard follows the IP group delivery model. In IP
multicast, corresponding to every active session there is a group address. Data sources
simply write to the group address, and receivers gather data by listening to the group
address. Individual members do not need to know the number of active senders at any
instant or the IP addresses of other members. This model adds functionality to the IP
model to ensure that the shared window is consistent among the members and that the
members exercise some kind of ownership over the data they create. UDP [4] has been
chosen as the underlying transport protocol because the IP group delivery model works
only on top of UDP. Since there is no client-server mode of communication, no centralized
ordering of the objects is possible.
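
To make the notion of a group address concrete, the following minimal sketch checks
whether an IPv4 address falls in the multicast (class D) range 224.0.0.0 to
239.255.255.255, which is the range from which a session's group address would be chosen.
The function names make_ipv4 and is_group_address are hypothetical and are not part of
WhiteBoard.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four octets a.b.c.d into a 32-bit IPv4 address in host byte order. */
uint32_t make_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Returns 1 if addr (host byte order) is an IPv4 multicast (class D)
 * address, i.e. lies in 224.0.0.0 - 239.255.255.255; otherwise 0.
 * Class D addresses are exactly those whose top four bits are 1110. */
int is_group_address(uint32_t addr)
{
    return (addr & 0xF0000000u) == 0xE0000000u;
}
```

An application could run such a check on the address a participant submits before
attempting to join the session.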

The design is object oriented [9] and consists of classes like client, shape, frame
and socket. The shape class is further subdivided into straight line, rectangle, ellipse
and curved line. This class is concerned with the display of the shape data on
the window. The client class maintains the multicast session and invokes handles of
other classes (frame and socket) for handling the task of receiving packets and their
transmission. The frame class is concerned with packet formats and defines their fields.
The socket class deals with the raw data on the network by opening a socket.

The application is structured in a layered manner: the Shape layer, the Frame layer
and the Bytestream layer. The Shape layer interprets data in terms of shapes and
maintains the objects displayed in the window. The Frame layer is concerned with the
generation of the application protocol frames and communicates with the peer layer to maintain
a session. The Bytestream layer operates at the socket level [6] and interprets data in
terms of bytes. A detailed description of the functions of the layers is given in a later
section.

The application-level protocol is used by the group members to force a consistent
interpretation of the data. A shape can be thought of as an object with the attributes
of Type, Identification, Pen Size, Colour and Position on the screen. The protocol
interface to the Shape layer is in terms of objects with the above attributes.

The IP group delivery model works on top of UDP. Hence the protocol assumes lossy
connectionless service from the underlying network. Packets are multicast to the group
whenever an active member puts up a shape in his window. At the same time, all the
other members keep listening to their group socket and read the data whenever it is
available.

Figure 2.1: The Packet Structure for Graphical Object

The packets are of fixed size (64 bytes). This choice of packet size does not waste
too much bandwidth. The data and control information for all the shapes is contained
within 64 bytes. There are separate packet formats for graphics objects and text. These
two packet formats are shown in Figure 2.1 and Figure 2.2. In the packet, various fields
have been defined. The object type field differentiates between various objects. The rest
of the fields define the attributes of the object, based on which the object can be properly
ordered and displayed in all the peer windows.
+---------------------------------------------------------------+
| Packet Code                                                   |
| Source Host IP Address                                        |
| Source Process ID                                             |
| Timestamp                                                     |
| Sequence Number Local to Source                               |
| Sub Sequence Number of the Object                             |
| Object Type            | RGB Colour Value                     |
| Height of Character                                           |
| Width of Character                                            |
| Escapement of Character                                       |
| Orientation of Character                                      |
| Weight of Character                                           |
| Italics       | Underline      | Strike Out   | Char Set      |
| O/P Precision | Clip Precision | Quality      | Pitch         |
| Abscissa of Location                                          |
| Ordinate of Location                                          |
+---------------------------------------------------------------+

Figure 2.2: The Packet Structure for Text Object

Each user-level object is uniquely identified by the user's host IP address, the user's
process ID and a sequence number local to the user. The packets describing the same
object are distinguished and ordered by the sequence number. The construction of objects
is controlled by passing some control information in the packets. In the case of a straight
line, a rectangle, or an ellipse, since the object needs to be displayed only when its position
has been entirely decided, a single packet is needed with the corresponding control code,
which has the two relevant points (these points denote the two end points in the case of a
straight line, and in the case of ellipse and rectangle they denote the left-top and
right-bottom corners of the rectangle). However, in the case of the curved line, the process of
drawing should be visible on all the windows. To achieve this, a control code denotes
the start of a curved line. The points comprising the curved line are tagged with
sequence numbers so that they can be indexed properly even if they arrive out of order.
Finally, after the last point has been sent, a special packet denoting the end of the curved
line is multicast to the group. Text is sent as one packet for each character, along with the
relevant font information.


2.2 WhiteBoard Design Issues and Implementation

One major issue in the design of the WhiteBoard was to provide for concurrent update
in any part of the window. Any method of overcoming the concurrency problem must
necessarily identify the packets uniquely and provide for queueing of the packets so that
the objects are displayed in the window at the appropriate time. Two major problems in
incorporating this during multicast are the reflected packet problem and the simultaneous
write problem. Due to the asynchronous mode of communication, each packet sent to
the network is immediately received back. This is termed the reflected packet. This
leads to a shift of control to another message handler. This message handler then
identifies the packets as its own and drops them. A similar problem occurs when
two members write simultaneously. At the uppermost layer this implies the presence of
more than one Graphics Device Context at the same time. For each received message,
the WhiteBoard allocates a device context, updates the window using the message, and
then deallocates the device context. For the local graphics, however, a separate device
context is maintained throughout the creation of the shape.
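
The reflected packet check described above can be sketched as a comparison of the
source identity in a received packet with the receiver's own identity. This is only an
illustration: the SourceId structure, its field widths and the function name are
assumptions, not the WhiteBoard code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Identity carried in every packet header (field widths are assumed). */
typedef struct {
    uint32_t host_ip;     /* source host IP address */
    uint32_t process_id;  /* source process ID */
} SourceId;

/* Returns 1 if the packet was sent by this very process (a reflected
 * packet that should be dropped), 0 if it came from another member. */
int is_reflected(const SourceId *pkt, const SourceId *self)
{
    assert(pkt != NULL && self != NULL);
    return pkt->host_ip == self->host_ip &&
           pkt->process_id == self->process_id;
}
```

Comparing the process ID as well as the host address keeps two members running on the
same machine from dropping each other's packets.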

The ordering requirement for the WhiteBoard is not very stringent. The requirement
is that the windows of different members be more or less consistent. However, local
ordering is done so as to allow for the unique identification of each object. The local
ordering is achieved by providing the objects with sequence numbers. Each host
orders the objects for display depending on the order of their arrival. A cross-linked chain
of the host list and the shape list achieves this order. In the cross-linked structure it is
assured that the objects corresponding to a particular host are in order, even if they have
arrived out of order. Thus the only inconsistency that can be present among different
windows is in the order of placement of the objects created by different members.
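
The per-host ordering can be illustrated with a simplified sketch that keeps the
sequence numbers received from one host in ascending order, whatever their order of
arrival. WhiteBoard uses a cross-linked chain of lists; the fixed-size array, the type
HostObjects and the function insert_in_order below are assumptions made to keep the
example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 64  /* arbitrary capacity for this sketch */

/* Objects received from one host, identified by sequence number only. */
typedef struct {
    uint32_t seq[MAX_OBJECTS];  /* sequence numbers, kept ascending */
    size_t   count;
} HostObjects;

/* Insert seq so that the list stays sorted. Returns 1 on insertion,
 * 0 if the number is already present or the list is full. */
int insert_in_order(HostObjects *h, uint32_t seq)
{
    size_t i, pos = 0;
    assert(h != NULL);
    if (h->count >= MAX_OBJECTS) return 0;
    while (pos < h->count && h->seq[pos] < seq) pos++;
    if (pos < h->count && h->seq[pos] == seq) return 0;  /* duplicate */
    for (i = h->count; i > pos; i--) h->seq[i] = h->seq[i - 1];
    h->seq[pos] = seq;
    h->count++;
    return 1;
}
```

One such list per host gives the property stated above: objects from a single host are
always in order, while the interleaving of objects from different hosts is left open.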




2.2.1 The Layers

The Bytestream Layer 

This layer communicates directly with the network. To make this layer an independent
entity, a SocketBase class has been introduced. This provides the upper layers with a
handle to start one end of a session without being concerned about the socket options,
the network errors and maintenance. The class MulticastSocketBase is derived from the
SocketBase class and provides multicast support. The upper layers can start multicast
sessions and perform network read or write operations using this layer. Further, the
support for reading and writing is entirely asynchronous, and the parent window receives
messages that contain information regarding the connection (socket) that has a
message to be read.

The Frame Construction Unit

WhiteBoard has a separate frame construction unit that deciphers the packets
received from the network. This unit is organized in such a way as to minimize the overhead
on the other layers, and also to speed up the job. This unit handles both the encapsulation
of the outgoing data into frames and the retrieval of data from the incoming
packets. Outside this unit, the treatment of frames is in terms of the abstract entities that form
part of the frames. In essence this means that the frame construction unit understands
all packet formats (and only this unit needs to be aware of all the packet formats), and
the upper layers get the required information from this layer. This allows the unit to
hide the conversion of data into network byte order before placing it into the frame, and
the conversion of data extracted from an incoming frame into the local byte order before
giving it to the unit that needs the data. This conversion is necessary to communicate
with WhiteBoard applications running on platforms that support different byte
orderings.
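
The byte order conversion hidden inside the frame construction unit can be sketched
as follows. The helper names put_u32_net and get_u32_net are hypothetical; the sketch
only shows the technique of writing and reading a 32-bit field in network (big-endian)
byte order independently of the byte order of the local machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value at buf in network byte order (big-endian). */
void put_u32_net(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Read a 32-bit value stored at buf in network byte order. */
uint32_t get_u32_net(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the helpers work byte by byte, the same code gives the same frame contents on
little-endian and big-endian hosts.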

The Frame Layer 

The frame layer primarily consists of the class Client, which derives from the
MulticastSocketBase class and extends its functions with session maintenance functionality for
the WhiteBoard. The frame layer provides a handle to the upper layer which mentions
only the shape to be written to the network. This layer calls the frame construction
unit to generate the frame and calls the handle of the Bytestream layer to transmit the
packet. The incoming packets are taken from the Bytestream layer. The frame layer
constructs the shapes from these packets and returns them to the upper layer.

The frame layer also does the work of session maintenance. All the queues and lists
of shapes are maintained here, and this layer performs the memory management for the
WhiteBoard by maintaining the queues.

The Shape Layer

This is the layer that provides the highest level of abstraction and deals with shapes
that are put up on the window and the operations pertaining to them. An abstract
class called shape has been designed, and from it the specific classes like straight line,
rectangle, ellipse, curved line and text have been derived. The shape layer passes
the shape objects to the frame layer, which in turn packetizes them and sends them
over the network through the Bytestream layer. At the receiver's end, the frame layer
passes pointers to the shapes to the shape layer, which then displays them by calling the
appropriate display function for the shape.

2.3 Conclusion

Using layers and functions defined for them, a modular structure of the WhiteBoard was
developed by Gupta and Mukhopadhyay. This design makes it easy to incorporate new
features or enhance existing features in the WhiteBoard, because only one or two modules
(layers) may have to be restructured, with no changes in any other module.

We have used WhiteBoard as a platform for developing SlideTalk. In order to
incorporate the slide import and voice transmission features, we have modified the top layers,
while the lower layers that take care of multicasting of data are retained from WhiteBoard.
Chapter 3


Design and Implementation of the 
Voice Component 


Our emphasis in this thesis work is on supporting real-time communication to enable
desktop conferencing on Personal Computers. The approach that we can adopt for the
real-time communication of voice data depends on numerous factors, like the
architecture of the audio subsystem, the interaction of the operating system with the audio
subsystem, the available network bandwidth, and the degree of MAC layer support in
the network for real-time communication.

In the present thesis work of incorporating the voice capability in SlideTalk, we first
discuss the design issues to be considered for supporting voice conferences. Thereafter we
describe our software-oriented approach for implementing the voice communication facility
and integrating it into SlideTalk.

3.1 Design Issues for Real-Time Audio Conferencing

The most stringent conditions for a high quality of presentation in real-time
communication are presented by voice data. This is because there is very little temporal
coherence [12] in voice data, which means that if any packet is missed or gets played
multiple times, it is easily noticed by the user. In such a scenario, the
design of a voice conferencing system requires a study of all the aspects concerned with it:
the MAC layer support, the computational requirements and the buffer management
strategies.

3.1.1 MAC Layer Support

More important than the available bandwidth are the restrictions that multimedia
applications, in particular interactive distributed multimedia applications, impose on the
underlying network. There are local area networks [15] which have a low bandwidth,
do not support any priority for their traffic, and may be unfair in the distribution of
bandwidth. In high load situations they exercise no control over access delay or the available
bandwidth per application. On the other hand, there are MAC layer protocols that are fair
in the distribution of bandwidth, have a large bandwidth and can set priorities for their
traffic (e.g. FDDI). Some of them also support synchronous traffic by guaranteeing bounds
on the access delay.

In an asynchronous network environment (e.g. Ethernet), the way the network is used
by others cannot be controlled, and therefore the delay and the number of discontinuities
that will occur over any interval of time cannot be bounded. This can lead to jitter in
the voice communication and/or packet loss. These two issues are discussed below.

Jitter 

In order to sustain a high-quality conference, samples of voice data must arrive at the
receiving station so as to be played at a time that is at a constant offset from the time
at which they were generated. Although these samples can be generated at the desired
rate by the audio hardware, the exact rate at which the samples arrive at a receiver can
be grossly distorted by poor operating system scheduling on the transmitting machine
and varying load in the network. The problem is to ameliorate the effect of jitter (i.e.
variance in the packet inter-arrival time at the receiver) in the arrival stream.
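
Taking jitter as the variance of the packet inter-arrival time, a receiver could
estimate it from a window of arrival timestamps. The sketch below is only an
illustration of that definition; the function name, the use of milliseconds and the
choice of the population variance are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Population variance of the inter-arrival gaps of n packets whose
 * arrival times (in ms) are given in ascending order. Needs n >= 2. */
double interarrival_variance(const double *arrival_ms, size_t n)
{
    double mean = 0.0, var = 0.0;
    size_t i, gaps;
    assert(arrival_ms != NULL && n >= 2);
    gaps = n - 1;
    /* First pass: mean gap between consecutive arrivals. */
    for (i = 1; i < n; i++) mean += arrival_ms[i] - arrival_ms[i - 1];
    mean /= (double)gaps;
    /* Second pass: mean squared deviation of each gap from the mean. */
    for (i = 1; i < n; i++) {
        double d = (arrival_ms[i] - arrival_ms[i - 1]) - mean;
        var += d * d;
    }
    return var / (double)gaps;
}
```

A stream arriving at perfectly regular intervals gives a variance of zero; bursts and
gaps in the arrival stream increase it.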

Jitter in a transport system takes place due to variations in latency. For each
audio packet, the total latency has the following components:

• the time spent in the audio device (e.g. Sound Blaster) pipeline at the originating
machine

• the time spent at the network interface waiting to access the network

• the physical network transmission time

• the time spent queued at the receiver waiting to get played out

The main factor in the control of latency is the network interface time. Depending
on the underlying network (MAC layer), the jitter can be either bounded or unbounded.
Bounded jitter (an upper bound on delay) is experienced on local area networks like
FDDI, which support synchronous traffic with the delay limit configurable at network
initialization time. On Ethernet we observe unbounded delays. Network traffic bursts
increase latency and thus introduce jitter at the receiving workstation.

The effect of jitter can be reduced by using an initial playout delay. Let r be the
playout delay, measured as the time between the instant at which the first sample was
generated and that at which it was played out. Large values of r can absorb large values
of jitter. However, a large value of r affects real-time communication, especially when
more than one participant is involved in the communication. Thus there is a trade-off in
the value of r that can be chosen to minimize the effect of jitter and to allow real-time
communication. However, if the packets are generated on one LAN and received on another
LAN via an internet, then even if both the LANs offer bounded delay, unbounded jitter may
still be experienced. This is so because the underlying UDP/IP does not guarantee bounded
delay.
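
The role of the playout delay r can be shown with a small sketch: every sample is
scheduled to be played r milliseconds after it was generated, and a packet that arrives
after its scheduled time has missed its slot. The function names are hypothetical, and
the sketch assumes that sender and receiver share a common time base, which a real
system would have to approximate.

```c
#include <assert.h>

/* Scheduled playout time of a sample generated at gen_ms, with an
 * initial playout delay of r_ms (common time base assumed). */
long playout_time_ms(long gen_ms, long r_ms)
{
    assert(r_ms >= 0);
    return gen_ms + r_ms;
}

/* Returns 1 if a packet arriving at arrival_ms missed its playout slot,
 * 0 if it arrived in time to be played. */
int is_late(long gen_ms, long arrival_ms, long r_ms)
{
    return arrival_ms > playout_time_ms(gen_ms, r_ms);
}
```

With r = 200 ms, a packet delayed by 150 ms is still played on time while one delayed
by 250 ms is not; raising r absorbs more jitter at the cost of a longer
mouth-to-ear delay.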

Packet Loss 

As the load on the network increases, packets may be lost as they pass through routers
or bridges. Traditional data communication strives to provide reliable end-to-end
communication between two peers. Existing communication systems always use checksums
and sequence numbering for error control, and some form of negative or positive
acknowledgment with packet retransmission for error recovery. If the checksum is not
performed in hardware, either at the media access control (MAC) or the link layer, it can
affect the system performance. The (negative) acknowledgment with subsequent retransmission
handshake adds more than a full round-trip delay to the transmission of the data. For
time-critical data of multimedia or conferencing applications, the retransmitted message
might thus be altogether useless.

The issue of reliability becomes even more complex for multipoint
communication. Currently there exists no generally agreed upon scheme for the semantics of a
reliable multicast service. Thus networks simply provide unreliable multicast and leave
the implementation of reliability to the higher communication layers, usually the
application.

3.1.2 Computational Requirements

It is quite essential to make a proper choice of audio processors, as it is a vital factor
in determining the load on the CPU when a conferencing application is running. At
one extreme are the systems where digital or analog audio signals are acquired by largely
autonomous audio processors that are capable of delivering the digital data directly to
an assigned buffer or device with little, if any, intervention by the CPU. Examples of
such systems are those with direct attachment of off-the-shelf teleconferencing codecs to
workstations. At the other extreme are computer systems where the CPU is intimately
involved in the control of the audio processors and in the communication of the data
between the processes controlling the audio processors. An example of such a system is
the Sound Blaster 16-bit Audio Record system. In this system the CPU at the sender's
end directs the allocation of a buffer, supplies the buffer to the audio subsystem, collects
it from the subsystem and then transmits it. At the receiver's end the CPU receives
the buffer and supplies it to the subsystem, which plays it out and then returns the
buffer to the application for initializing. The effective use of such an audio subsystem
requires that real-time services be provided by the operating system.

3.1.3 Buffer Management

The trickiest data to handle for transmission is voice data, and it requires proper buffer
management. A technique that can be used for transmitting and receiving
voice data is the double buffering method. For reception, one buffer is used to store
the incoming data and the other buffer is used for playback. Once the first buffer is
filled or the audio playback of the second buffer has completed, the two buffers are
switched. The buffers for transmission of voice are used in a similar manner. Here
again, two buffers are used. One buffer is used to store voice
data captured by the application. When it is full, the voice data is sent to the multicast
address. Meanwhile the other buffer is supplied to the audio device to store the voice
data.

There is, however, a drawback associated with the double buffering method. If the
variance in latency is so high that it exceeds the packet generation time, there is a high
probability that packets will be lost. This is so because if two consecutive
packets experience large latency and the following packet observes small latency, it might
arrive when both buffers of the receiving station are busy; in such a scenario that packet
will get dropped.
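
The receive side of double buffering, including the drop case just described, can be
sketched as a small state machine. This is a simplified model under stated assumptions:
one packet fills exactly one buffer, and the names BufState, DoubleBuffer, on_packet
and on_playback_done are hypothetical rather than taken from SlideTalk.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BUF_FREE, BUF_FILLED, BUF_PLAYING } BufState;

typedef struct { BufState state[2]; } DoubleBuffer;

/* A packet arrives: store it in a free buffer and return 1, or return 0
 * (packet dropped) when both buffers are busy. If nothing is playing,
 * playback of the new data starts at once. */
int on_packet(DoubleBuffer *db)
{
    int i, playing;
    assert(db != NULL);
    playing = db->state[0] == BUF_PLAYING || db->state[1] == BUF_PLAYING;
    for (i = 0; i < 2; i++) {
        if (db->state[i] == BUF_FREE) {
            db->state[i] = playing ? BUF_FILLED : BUF_PLAYING;
            return 1;
        }
    }
    return 0;
}

/* Playback finished: free the played buffer and switch to the other
 * buffer if it already holds data. */
void on_playback_done(DoubleBuffer *db)
{
    int i;
    assert(db != NULL);
    for (i = 0; i < 2; i++) {
        if (db->state[i] == BUF_PLAYING) {
            db->state[i] = BUF_FREE;
            if (db->state[1 - i] == BUF_FILLED)
                db->state[1 - i] = BUF_PLAYING;
            return;
        }
    }
}
```

In this model a third packet that arrives before the first has finished playing finds
no free buffer and is dropped, which is the loss scenario of the paragraph above.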

3.2 Implementation of the Speech Component

In this thesis we have implemented voice support for the SlideTalk application. For
voice digitization and playout we use Sound Blaster 16-bit Audio Record hardware and
Pentium PCs. We assume that the network supports multicast UDP/IP. Our work is
based on the object-oriented framework of WhiteBoard. The lowest layer in WhiteBoard
deals with raw data and supports multicasting. We could therefore deal directly with
the audio issues and the packet format for transmission without worrying much about the
network read or write operations. In the upper layers new objects have been designed.
In the Shape layer a new shape called Voice has been developed, and in the Frame
layer a new packet format has been incorporated which supports all the required fields
for this new shape. The packet structure is shown in Figure 3.1.

The mode of communication is FIFO (first in, first out). In this mode the first person
opting to speak gets the chance, and the speech menu of the other members of the
group is deactivated till this speaker completes talking. The mode has been
selected as FIFO because our audio subsystem requires intimate involvement of the CPU
in real time for its control as well as for communication of the audio data between the
processes controlling the audio processors. In such a scenario, if more than one speaker
is active at a time, the CPU load increases severely.

In the FIFO mode, if two speakers opt to speak at the same time, both would be
informed of the collision and, as a result, they would be required to opt out of the audio
mode and try after some time so that the collision gets resolved.

Figure 3.1: The Packet Structure for Voice Data

When the speaker opts for voice communication, a message is given to the WAVE
device (the Sound Blaster card) to open it in the record mode. The WAVE device of
the speaker is supplied the WAVEFORMAT. The WAVEFORMAT structure is as shown
below:

typedef struct waveformat_tag {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
} WAVEFORMAT;

wFormatTag specifies the format type of audio encoding. nChannels indicates the number
of audio channels. nSamplesPerSec specifies the number of samples to be generated per
second. nAvgBytesPerSec specifies the average data transfer rate in bytes per second.
nBlockAlign is the minimum atomic unit of data, i.e., one sample size.

We have chosen 8-bit linear PCM encoding with mono recording. Each
sample is, therefore, one byte, and we collect 10,000 samples per second. Thus SlideTalk
generates traffic of 10 Kbytes per second for supporting the voice conferencing.
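The relation between these parameters can be checked with a short calculation. The sketch below uses a portable stand-in for the WAVEFORMAT fields rather than the Windows type itself; PcmFormat and pcm_format are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Portable stand-in for the WAVEFORMAT fields (not the Windows type). */
typedef struct {
    uint16_t format_tag;
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t avg_bytes_per_sec;
    uint16_t block_align;
} PcmFormat;

#define FORMAT_TAG_PCM 1u   /* value of WAVE_FORMAT_PCM in the Windows headers */

/* Derive the dependent fields from channels, sample rate and sample width. */
PcmFormat pcm_format(uint16_t channels, uint32_t rate, uint16_t bits_per_sample)
{
    PcmFormat f;
    f.format_tag        = FORMAT_TAG_PCM;
    f.channels          = channels;
    f.samples_per_sec   = rate;
    f.block_align       = (uint16_t)(channels * (bits_per_sample / 8));
    f.avg_bytes_per_sec = rate * f.block_align;
    return f;
}
```

With one channel, 10,000 samples per second and 8 bits per sample, the block alignment is one byte and the average rate is 10,000 bytes per second, which matches the traffic figure quoted above.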

After the device is opened, buffers are supplied to it to record the voice data. The
opened WAVE device accepts data only in the WAVEHDR format, which in turn points
to the buffer space for the data. The WAVEHDR structure is as shown below:

typedef struct wavehdr_tag {
    LPSTR  lpData;
    DWORD  dwBufferLength;
    DWORD  dwBytesRecorded;
    DWORD  dwUser;
    DWORD  dwFlags;
    DWORD  dwLoops;
    struct wavehdr_tag *lpNext;
    DWORD  reserved;
} WAVEHDR;

lpData points to the waveform data buffer. dwBufferLength specifies the length of the
data buffer. dwBytesRecorded specifies the amount of data recorded in the buffer (for
the input device). dwUser specifies 32 bits of user data. dwFlags provides information
about the data buffer. dwLoops specifies the number of times to play the loop. The next
two fields are reserved for future use.

Before a WAVEHDR is given to the WAVE device, it is prepared for the record
(in) mode and then transferred to the buffer of the WAVE device. When the WAVE device
fills the supplied buffer, the flag in the WAVEHDR is set. Meanwhile, the application
keeps looking at this flag; when it finds the flag set, it empties the buffer associated with
that WAVEHDR and packetizes its contents.

Similarly, at the receiver's end, when the indication of the start of the session arrives,
the WAVE device is opened in the playout mode. Again a WAVEFORMAT is given to
the WAVE device, mentioning the sample size and the sample rate at which we want to
play the audio received.

For a good response of the received voice it is essential that the WAVE device in
the playout mode be opened with the same specifications as that of the sender's. In the
playout mode the WAVEHDR is filled with information like the pointer to the buffered
data received and its size. Then it is added to the WAVE device. The WAVE device informs
the application, with the help of a flag, when it has played the voice data supplied to it.

The trickiest thing in voice conferencing is the manner in which the voice data
is transmitted. This is so because we are working with asynchronous packet switched
networks (Ethernet), with no MAC layer support for real time communication. Ideally,
the voice data should be transmitted as soon as it is generated. When the network
access time to send a frame becomes larger than the frame generation time, queues form in
the transport system. The length of these queues and the policies for managing them can
strongly affect the throughput and the latency of the conference. For example, if a buffer
cannot be transmitted when it is generated, enqueueing the buffer only serves to increase
its latency. However, if frames that cannot be transmitted immediately are discarded, the
throughput of the conference may unnecessarily decrease (with a corresponding increase
in the number of discontinuities). A transport queue of length greater than one is useful
only if the additional latency it introduces can be tolerated by the users of the system.

The voice stream is not as susceptible to the effect of queue length because, for the
network that we are using, a large number of audio samples can be transmitted in a single
packet. We use only two buffers, which means that we can get a queue of at the most one
at any time. This choice of two buffers has been made because our audio buffer (1000
bytes) takes only 0.8 ms for propagation and we are transmitting just one packet per
100 ms; thus there is sufficient time to send a buffer while the other buffer is getting filled
up. Hence, unless very high load conditions persist for long periods on the network, our
conference can be sustained, maintaining high fidelity and observing very few discontinuities.
Should the bandwidth available to the conference decrease below that required for the
full audio sampling rate, more aggressive application level mechanisms, such as changing
the coding or compression scheme, will have to be employed to sustain the conference.

While transmitting, one buffer is used to store voice data captured by the microphone.
When it is full, the data is attached with a header and then sent to the multicast address.
At this time the other buffer is attached with the WAVE device to store the voice data.
Thus the two buffers keep switching their roles. The flow diagram of the scheme is as
shown in Figure 3.2.
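The timing margin behind the choice of two buffers can be verified with integer arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the link rate of 10 Mbit/s is our assumption, chosen because it is consistent with the 0.8 ms figure for a 1000-byte buffer.

```c
#include <stdint.h>

/* Time, in microseconds, for the audio device to fill a buffer. */
uint64_t fill_time_us(uint32_t buffer_bytes, uint32_t bytes_per_sec)
{
    return (uint64_t)buffer_bytes * 1000000u / bytes_per_sec;
}

/* Time, in microseconds, to clock the same number of bytes onto the link. */
uint64_t wire_time_us(uint32_t bytes, uint32_t link_bits_per_sec)
{
    return (uint64_t)bytes * 8u * 1000000u / link_bits_per_sec;
}
```

For a 1000-byte buffer at 10,000 bytes per second, fill_time_us gives 100,000 microseconds (100 ms), while wire_time_us at an assumed 10,000,000 bits per second gives 800 microseconds (0.8 ms), so sending one buffer uses under one percent of the time taken to fill the other.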

In our implementation we do not introduce any initial playout delay. We packetize
this data by attaching a header of 28 bytes to it and then multicast it. The size of
the packet has been chosen so that, on one hand, it does not contain so large an amount
of speech data as to cause severe discontinuity in case of a packet loss and, on the
other hand, it is not so small as to require network access more frequently (which
increases the probability of packet delay/loss) and increase the CPU load to transmit
the buffers. In the work done by Jeffay et al. [13], they recommend redundant audio data
transmission with packet size as large as 10 Kbytes to support a high fidelity audio and
video conference. The use of this scheme drastically reduces the chances of packet loss
and the discontinuities but comes at the cost of a higher bandwidth requirement.

The effect of the playout delay is to cushion the conference from the jitter caused by
the bursty network traffic. In our implementation, if a network traffic burst of up to 100
ms occurs in the network, SlideTalk will not incur any packet loss.

For reception of the voice data we again use two buffers. Packets arriving from the
network are processed by the Frame layer. Loopback packets are screened out, whereas
the packets from the other hosts have their headers extracted. The header gives
information like the speaker's IP address and the process ID. After stripping off the
header, the voice data is copied to the buffer deployed for storing the data. When the
buffer is full it is submitted to the audio output device. The buffer previously submitted
to the audio device is now used to store the incoming packet. When the audio device has
finished processing the buffer whose address is pointed to by the WAVEHDR, the device
sets the relevant flag of the WAVEHDR. The main program processes the buffer after it sees
the flag set. The buffer that currently stores the incoming packet is then submitted to
the audio device, and the buffer returned by the audio device takes its place to store the
incoming packet instead. The flow diagram for this scheme is shown in Figure 3.3.

Figure 3.2: The Flow Diagram for Packet Transmission Scheme

Figure 3.3: The Flow Diagram for Reception of Audio Data

In the case of packet loss or delay, the buffer which is used to store incoming packets
may not yet be full when the other buffer's contents have been played out. In such a
case, the two buffers form a queue to receive the incoming packets and, on reception of
the buffer, once again switch to their normal roles. However, jitter may be observed by
the receiver.

The data which is transmitted can easily be sent over a WAN (since the packet 
is transmitted using UDP/IP protocol) and can be received even by a system whose 
operating system has a different byte order architecture than that of the sending machine. 
This is achieved by writing the data in the network byte order in the transmitted packet, 
and on receiving the packet the data is converted into the local byte order of the receiving 
system. 
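The byte order conversion can be illustrated for a single 32-bit header field. This is a minimal sketch with hypothetical helper names; Winsock's htonl and ntohl perform the equivalent conversion, and the explicit shifts are used here only so that the example does not depend on any platform header.

```c
#include <stdint.h>

/* Write a 32-bit value in network (big-endian) byte order. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* Read a 32-bit value that was stored in network byte order. */
uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because both helpers work on individual bytes, the value read back is the same whether the sender and the receiver are little-endian or big-endian machines. The 8-bit voice samples themselves need no conversion, since each sample is a single byte.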

To cater to our requirement and to minimize the network traffic, we have chosen two 
sizes of SlideTalk packets. The first is 64 bytes - the packet for sending graphics and 
text information - it contains the required data and control information to recreate the 
objects. The other size is 1028 bytes - the packet for sending the voice data; in this 
packet the last 1000 bytes contain the voice data and the initial bytes contain the control 
information. 
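The layout of the voice packet can be sketched as follows. The exact fields of the 28-byte header are given by the packet structure figure and are not reproduced here, so the header is treated as an opaque block; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define VOICE_HEADER_BYTES 28    /* control information        */
#define VOICE_DATA_BYTES   1000  /* one full audio buffer      */
#define VOICE_PACKET_BYTES (VOICE_HEADER_BYTES + VOICE_DATA_BYTES)

/* Copy an already-encoded header and one audio buffer into a packet.
 * Returns the packet length in bytes. */
size_t build_voice_packet(unsigned char *out,
                          const unsigned char *header,
                          const unsigned char *voice)
{
    memcpy(out, header, VOICE_HEADER_BYTES);
    memcpy(out + VOICE_HEADER_BYTES, voice, VOICE_DATA_BYTES);
    return VOICE_PACKET_BYTES;
}

/* Locate the voice data inside a received packet. */
const unsigned char *voice_payload(const unsigned char *packet)
{
    return packet + VOICE_HEADER_BYTES;
}
```

The receiver strips the header simply by reading from voice_payload(packet), which points at the last 1000 bytes of the 1028-byte packet.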

For synchronization of the packets received, for example, on a LAN connected to
the transmitting LAN via a WAN, where the packets can be received out of order or
the packets of a previous session can also be received, the header control information
is used. The header contains the IP address of the sender and the process ID (which
would differ for each conference initiated by this speaker). In the same session, when
the speaker speaks at a different time, a different sequence number is allotted each time,
and all the packets getting generated are allotted monotonically increasing sub-sequence
numbers. With the help of the sequence number and sub-sequence numbers the voice
can be played out in smooth order at the receiver's end, because the receiving station will
drop the packet if its sub-sequence number is less than that of the one being played out.
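The drop rule can be sketched as a small filter. The type and function names are hypothetical. Treating every packet whose sender, process ID or sequence number differs from the talk in progress as stale is our assumption, as is initialising the state from the start of the talk.

```c
#include <stdint.h>

/* Identity of the talk in progress and the position reached in it. */
typedef struct {
    uint32_t sender_ip;
    uint32_t process_id;
    uint32_t seq;              /* one value per talk by the speaker     */
    uint32_t playing_sub_seq;  /* sub-sequence number being played out  */
} PlayoutState;

/* Returns 1 if the packet should be played out, 0 if it is dropped. */
int accept_packet(PlayoutState *st, uint32_t ip, uint32_t pid,
                  uint32_t seq, uint32_t sub_seq)
{
    /* Assumption: a packet from another session or another talk is stale. */
    if (ip != st->sender_ip || pid != st->process_id || seq != st->seq)
        return 0;

    /* Rule from the text: drop a packet older than the one being played. */
    if (sub_seq < st->playing_sub_seq)
        return 0;

    st->playing_sub_seq = sub_seq;
    return 1;
}
```

With this rule a late packet is never played after a newer one, at the cost of a short gap in the audio.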

We also looked into the aspect of multicasting the contents of audio files. The wave files
start with a WAVEHDR which gives the size of the file. First the WAVEHDR
is sent and then the rest of the data is packetized and sent. At the receiver end the same
information is put together once again in a wave file and then played out.

3.2.1 Talker Indication: 

Since the delays between packet generation and playout can be substantial, it is
inappropriate to indicate the talker at the time of packet arrival at the receiver end. In our
case, for example, we release our first data packet after an interval of 100 milliseconds;
if the talker indication is received by the other members only at that time, then by then
a few other members might also have pressed their audio menu, presuming that they are
the only ones to initiate the conference.

To avoid such a case, we send a control packet as soon as any one of the group
members opts to speak. This packet informs the other members that the voice data
shall soon be arriving. The control packet deactivates the speech menu of the receivers and
sends a control signal to the receiving station to open its WAVE device in the playout
mode. The device allocates sufficient buffers to deal with the incoming data. Similarly,
when the speaker finishes the talk a control packet is released which informs the group
members about it and activates their voice menu once again.
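The effect of the control packets on each member can be sketched as a small state machine. This is an illustration under our own assumptions and not the SlideTalk implementation: the state and event names are hypothetical, and a collision is modelled as a start indication arriving from another member while the local member already holds the floor.

```c
/* Floor state as seen by one group member. */
typedef enum { FLOOR_FREE, FLOOR_MINE, FLOOR_REMOTE, FLOOR_COLLISION } Floor;

typedef enum {
    EV_LOCAL_REQUEST,  /* local user selects the speech menu          */
    EV_LOCAL_RELEASE,  /* local user finishes or opts out             */
    EV_REMOTE_START,   /* control packet: another member starts       */
    EV_REMOTE_END      /* control packet: that member has finished    */
} FloorEvent;

Floor floor_step(Floor s, FloorEvent ev)
{
    switch (s) {
    case FLOOR_FREE:
        if (ev == EV_LOCAL_REQUEST) return FLOOR_MINE;    /* send start packet  */
        if (ev == EV_REMOTE_START)  return FLOOR_REMOTE;  /* open playout mode  */
        break;
    case FLOOR_MINE:
        if (ev == EV_LOCAL_RELEASE) return FLOOR_FREE;    /* send end packet    */
        if (ev == EV_REMOTE_START)  return FLOOR_COLLISION;
        break;
    case FLOOR_REMOTE:
        if (ev == EV_REMOTE_END)    return FLOOR_FREE;    /* menu active again  */
        break;                      /* local requests are ignored meanwhile */
    case FLOOR_COLLISION:
        if (ev == EV_LOCAL_RELEASE) return FLOOR_FREE;    /* opt out, retry later */
        break;
    }
    return s;
}

/* The speech menu can be used to request the floor only when it is free. */
int may_request_floor(Floor s)
{
    return s == FLOOR_FREE;
}
```

Sending the start indication before the first 100 ms of audio has been collected keeps the window in which two members can both believe the floor is free as short as the network delay allows.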

3.2.2 User Interface: 

The user interface presented is very similar to any standard application running under MS 
Windows. We provide a menu bar with options to select drawing and text parameters like 
pen thickness, font and colour; audio option can be selected by pressing the appropriate 
menu switch. The hot keys for the menu items have also been provided. There is also a 
tool bar that presents an icon-driven interface to these operations. 

3.2.3 Hardware and Software Requirements 

The system requirements are as follows: 

1. IBM compatible PC

2. MS-Windows 3.11 (WFWG)

3. Sound Blaster card

4. Ghostview for Windows

5. Winsock 1.1

6. Media Control Interface

The SlideTalk software is available for Beta testing from the Telematics lab¹, IIT Kanpur.

In this thesis work, thus, we have implemented the Voice Conferencing component
of SlideTalk. The communication mode during the conference is FIFO (the first person
opting to speak will get the speech control and the microphone of the other members
will get deactivated). We have used the double buffer method for transmission as well as
reception of the voice data. This model of SlideTalk will fit well for a tele-seminaring
session where mainly one speaker will be active at a time. However, through an interrupt
the audience can request the speaker to pass the speech control.


¹ Contact: dmanju@iitk.ernet.in or svs@iitk.ernet.in

Chapter 4


Testing of the Implementation, and 
Conclusions 


SlideTalk, in its present state of development, supports exchange of voice data
and slide file information along with graphics and text on a multicast framework. The
present framework, however, works under certain limitations imposed by the CPU speed
and the underlying network bandwidth.

The following section discusses the testing of the implementation. Section 4.2 contains
concluding remarks and mentions the scope for future work.


4.1 Testing of the Implementation

In this thesis work the speech component of SlideTalk has been implemented. The
implementation, however, works under two limitations. The first limitation is imposed by
the present CPU speed. As mentioned in the previous chapter, the CPU is intimately
involved in the supply of buffers to the audio hardware, their transmission and acquisition,
and their recording and playout. Thus the CPU is heavily loaded during real time voice
conferencing. For this reason we can have the communication mode as FIFO only,
and not free mode (in which the number of participants discussing simultaneously is not
a constraint). In the free mode the CPU, with its present speed, would not be able to
process the received data if the number of participants increases to more than 2.

The solution to avert such a problem can be a silence deletion algorithm, such that
no data is sent if a member is silent. Using such an algorithm the load on the CPU will
come down drastically, which would enable members to confer in the free mode.

The second limitation is closely related to the first one. This limitation is imposed by
the underlying network, which has no priority for real time data. On such a network, even
if the CPU is capable of handling free mode, the amount of data generated in the free
mode would require preferred treatment (priority) from the underlying network so as to
avoid jitter and minimize the discontinuities during real time conferencing. Once again,
a silence deletion algorithm will be a partial solution to this limitation.

This model can be best utilized during a tele-seminaring session with a speaker
enabled to talk and the audience able to listen in real time.

4.2 Conclusions and Scope for Future Work

SlideTalk maintains the layered structure of the WhiteBoard. Each layer has been
assigned a specific function to perform.

The sound component of SlideTalk enables multicast of the speaker's voice to all
the group members in real time. The trickiest thing in real time audio conferencing
is the transmission and reception of the audio buffers. We used a double buffering
method for packet transmission as well as reception. In such a scheme, while one buffer
is transmitted or received, the other buffer remains with the audio hardware, which fills
in the voice data or plays out its contents. The two buffers keep switching their roles.
Using such a scheme we have been able to run our sessions for quite long times (more
than two hours) without any perceptible loss of synchronization.

SlideTalk, with its present features of slide and sound along with graphical
annotating objects, is well suited to tele-seminaring. The present framework could
easily be extended to incorporate a real time video conferencing feature, using a technique
similar to that used for sound to transmit and receive the video buffers. An information
server can also be provided to enable members to retrieve the information even if they join
late or after the seminar session has ended. A silence deletion algorithm can also be
incorporated to enable the free mode of communication during an audio conference.